Selecting optimal image from mobile device captures

ABSTRACT

Embodiments of the disclosed technologies include a method of capturing, using a mobile device, a best-focused image of a skin surface of a subject, the method including: setting a camera of the mobile device to a fixed focal length; capturing, using the camera, a current image of a plurality of images of the skin surface, the plurality of images having a sequence and including a first previous image captured, using the camera, previously to the current image and a second previous image captured, using the camera, previously to the first previous image; producing a modified image from the current image; transforming the modified image, using a Laplacian pyramid, to produce a plurality of first luminance values from the modified image and a plurality of second luminance values from the plurality of first luminance values; averaging a plurality of first squared values, each including a square of a corresponding first luminance value of the plurality of first luminance values, to produce a first energy value; averaging a plurality of second squared values, each including a square of a corresponding second luminance value of the plurality of second luminance values, to produce a second energy value; calculating a first ratio of the first energy value to the second energy value; calculating, as an average first energy value of the first previous image, an average of the first energy value, a corresponding first energy value of the first previous image, and a corresponding first energy value of the second previous image; calculating, as an average first ratio of the first previous image, an average of the first ratio, a corresponding first ratio of the first previous image, and a corresponding first ratio of the second previous image; determining that the first previous image is one of a plurality of valid images, where each valid image of the plurality of valid images is an image of the plurality of images and has: a corresponding average first energy value above an energy threshold value; and a corresponding average first ratio approximately equal to 1.0; determining that a first valid image of the plurality of valid images is the best-focused image, where the first valid image has a corresponding average first energy value that is greater than the corresponding average first energy values of: a previous valid image captured immediately before the first valid image; and a subsequent valid image captured immediately after the first valid image; and performing an action associated with the best-focused image.

CROSS-REFERENCES TO RELATED APPLICATIONS

This application claims the benefit of and priority to U.S. patent application Ser. No. 15/573,325, filed Jan. 19, 2016, entitled “SELECTING OPTIMAL IMAGE FROM MOBILE DEVICE CAPTURES”, which claims the benefit of and priority to U.S. Prov. Pat. App. Ser. No. 62/161,318, filed May 14, 2015 and entitled “SURFACE METROLOGY, ILLUMINATION CONTROL, AND DIALOG-DRIVEN IMAGE CAPTURE USING A SMARTPHONE-LIKE DEVICE” and U.S. Prov. Pat. App. Ser. No. 62/222,897, filed Sep. 24, 2015, entitled “METHOD OF BEST CAPTURE OF CLOSE-UP IMAGES USING MOBILE PHONES,” each of which is incorporated herein by this reference in its entirety.

BACKGROUND

Ensuring health and well-being is important for life. Image processing is used more and more in health care for diagnosis and monitoring. For example, the human skin can be an indicator of an individual's well-being, in addition to being the literal face that we present to the world. Physical characteristics of the skin, such as the texture of one's complexion, the severity of and changes in a rash, bumps, moles, hydration level, etc., all can provide insight into an individual's well-being. Yet, these characteristics are not yet widely utilized, as precise measurement typically requires a specialized, custom-built imaging system or a visit to an expert. The specialized systems are expensive and bulky, as they need to operate under controlled lighting, at an appropriate camera angle, and at a proper distance from the object of interest. Visiting an expert's office is time consuming and costly. Neither of these approaches lends itself to personal daily use.

It is very attractive to use the cell phone to capture these images and provide the same functions of feature identification, e.g., for cosmetics guidance, diagnostics, or product identification. However, the cell phone is hand-held and human hands are shaky; the captured static image is often blurred and out of focus. Further, a subject photographing his own face cannot see the facial skin image at the time the image is taken, and therefore cannot determine if the camera is properly oriented and the image is in focus.

BRIEF DESCRIPTION OF DRAWINGS

This disclosure is illustrated by way of example and not by way of limitation in the accompanying figures. The figures may, alone or in combination, illustrate one or more embodiments of the disclosure. Elements illustrated in the figures are not necessarily drawn to scale. Reference labels may be repeated among the figures to indicate corresponding or analogous elements.

FIG. 1 is a diagram of an example system configured to identify a best-focused digital photograph from a plurality of digital photographs captured by a mobile device camera;

FIG. 2 is a diagram of an exemplary progression of image processing stages executed by a processor in accordance with the present disclosure;

FIG. 3 is a diagram of another exemplary progression of image processing stages executed by a processor in accordance with the present disclosure;

FIG. 4 is a graph plotting computed energy values and ratios thereof for a series of images, in accordance with the present disclosure;

FIG. 5 is a graph plotting computed energy values and ratios thereof for another series of images, in accordance with the present disclosure;

FIG. 6 is a flowchart of an exemplary method of determining a best-focused image from a plurality of images, in accordance with the present disclosure;

FIG. 7 is a diagram of a communication network in accordance with the present disclosure;

FIG. 8 is a flowchart of an exemplary method of analyzing a user concern using a virtual personal assistant, in accordance with the present disclosure; and

FIG. 9 is a schematic diagram of electrical components for a mobile device in accordance with the present disclosure.

DETAILED DESCRIPTION

While the concepts of the present disclosure are susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and are described in detail below. It should be understood that there is no intent to limit the concepts of the present disclosure to the particular forms disclosed. On the contrary, the intent is to cover all modifications, equivalents, and alternatives consistent with the present disclosure and the appended claims.

The present disclosure provides systems, device configurations, and processes for using a smartphone or other mobile device with a consumer-grade camera to determine, from a video or a series of digital photos, a best-focused image of a textured surface or other region of interest. While high in resolution, images produced by consumer-grade cameras have limited quality due to the selection of cost-effective lenses, image sensors, processing hardware, and other electronic components. Images are often out of focus, even when the corresponding photograph is captured at the proper focal length. Images can contain capture and/or processing artifacts such as additive white Gaussian noise, spectral highlights, and other anomalies that occupy some or all spatial frequencies of the image, which confuses and degrades the operation of known focus detection methods. Embodiments herein describe applying a pyramidal transform to an image, determining the Laplacian energies of the levels in the resulting Laplacian pyramid, and determining properties such as focus and noise content of the image from ratios between the Laplacian energies. The best-focused image of a series of related images can then be determined by comparing the Laplacian energy properties of the images on a rolling basis.

Referring to FIG. 1, a mobile device 100 adapted for performing the present image identification processes may be a laptop computer, tablet computer, e-reader, smartphone, personal data assistant, or other mobile device; in other embodiments, the present systems and processes may be implemented in an immobile or less-mobile computing device such as a personal computer, set-top box, digital media player, microconsole, home automation system, or other computing device. The mobile device 100 may include a processor 102, which may be a central processing unit (CPU), microprocessor, or other suitable processor that can control a camera 120 and employ various components of the mobile device 100 to enable capture of a plurality of photographs of the region of interest 110 and store them temporarily as a plurality of images 114 (i.e., image files) in a data store 112. As used herein, a data store may be any repository of information that is or can be made freely or securely accessible by the mobile device 100. Suitable data stores include, without limitation: databases or database systems, which may be a local database, online database, desktop database, server-side database, relational database, hierarchical database, network database, object database, object-relational database, associative database, concept-oriented database, entity-attribute-value database, multi-dimensional database, semi-structured database, star schema database, XML database, file, collection of files, spreadsheet, or other means of data storage located on a computer, client, server, combination of any number of servers, or any other data storage device and/or data storage media, in any standard, distributed, virtual, or clustered environment known in the art or developed in the future; file systems; and electronic files such as web pages, spreadsheets, and documents. An image file has a filename and contains data representing an image, and may contain other data as well, such as a time, a date, and a location of capture, a device used to capture, settings of the capture device at the time the image was captured, an image histogram, and the like.

The processor 102 executes device logic 104 in the processor 102 or in memory of the device 100 to process the plurality of images 114, and further, in some embodiments, to aid a user of the mobile device 100 in positioning, moving, and otherwise controlling the mobile device 100 so that the plurality of images 114 include the region of interest 110 and at least the best-focused captured image 116 is of a desired quality. In one example of aiding the user, the user may be the subject 108 himself, and the region of interest 110 may be the skin of the subject's 108 cheek, as illustrated. In this case, whether the camera 120 is on the same side of the mobile device 100 as the display screen 106, or on the opposite side, the subject 108 cannot see the display 106 during image capture. The processor 102 may send audible cues to the subject 108 via a speaker 122 or another output device; such audible cues can include alerts and/or speech providing guidance such as, without limitation, informing that the region of interest 110 is not in the camera 120 field of view, informing that the subject 108 is moving the mobile device 100 too fast or too slow, informing that the camera is tilted too much with respect to the skin surface, informing that the ambient light level is too low, or indicating that a best-focused image has been captured and identified.

The mobile device 100 may include one or more cameras 120, which may be any suitable image capture device that can be integrated in or connected to the mobile device 100. While cameras 120 of any level of quality and/or sophistication may be used, including digital single-lens reflex cameras and mirrorless system cameras, the processes described below are particularly configurable to be applied in consumer-grade smartphone (e.g., APPLE IPHONE, SAMSUNG GALAXY, etc.) and tablet (e.g., APPLE IPAD, AMAZON KINDLE FIRE, etc.) environments, wherein consumer-grade cameras 120 are nearly universally installed. Exemplary embodiments are described in which the focal length of the camera 120 may be fixed during image capture. Some consumer-grade cameras do not have adjustable lenses that provide a range of focal lengths; the present approaches are suited for such devices. Most current smartphones, however, are not only equipped with adjustable lenses, but also employ autofocus motors and/or software operating in conjunction with a range sensor to move the lens and simplify photography. Unfortunately, autofocus algorithms are not tuned to acquire the finest-detailed close-up images of textured surfaces such as skin. The present approaches operate on such devices by disabling or deactivating autofocus and fixing the focal length at an optimal position.

In some embodiments, the mobile device 100 may advantageously capture the plurality of images 114 sequentially (i.e., in a chronological sequence or order), as a video comprising frames that are the plurality of images, or using a native rapid-acquisition mode such as burst or continuous high-speed shooting mode, or from repeated manual actuation of a camera shutter, or using another camera control algorithm described herein. The plurality of images 114 may be captured as a user is moving the mobile device 100, either intentionally on a path or accidentally such as by dropping or shaking the mobile device 100. Components such as range finders, accelerometers, and other sensors, camera flashes 118 and other lights, display screens 106, global positioning systems, network connections, and software modules such as image recognition applications, may all be employed to guide positioning and movement of the mobile device 100, activation of the shutter, illumination of the subject 108, and the like.

The systems, devices, and methods are described as operating on a plurality of images 114 that are strongly and clearly related, such that the best-focused image 116 will be stored and/or acted upon, and the others are expected to be disposed of. Examples include series of images that are related in both time and subject matter as described above, as well as images taken of the same subject but at different times, such as before and after photographs of treated skin. Processes are described below for re-capturing images when the plurality of images 114 are not sufficiently related, or when even the best-focused image 116 is still blurry. Presuming a successful capture of a plurality of related images 114, the processes for identifying the best-focused image 116 are now described with further reference to the figures.

Referring again to FIG. 1, the processor 102 makes determinations regarding the focus of an image 130 based on pixel data of some or all of the pixels 132 of the image 130. In particular, the processor 102 applies a discrete Laplace operator iteratively to the image 130, forming what is known as a Laplacian pyramid. The discrete Laplace operator is a well-established mathematical operation that transforms a first function of a real variable, such as space, into a second function such as spatial frequency. The Laplacian pyramid representation has known applications in image processing, including edge detection, blurring, and sharpening; a processor convolves the image with one or more filtered versions of the image produced by the discrete Laplace operator to perform these tasks. A Laplacian pyramid, for image processing, may be considered a spatial bandpass or highpass representation that transforms the image from the spatial domain to the spatial frequency domain. The Laplacian pyramidal transform uses a kernel (the discrete Laplace operator) to control its output, the kernel having parameters including a filter size (a two-dimensional matrix wherein the elements represent pixels, the kernel giving weights to the pixel being filtered and its adjacent pixels within the kernel area), a cutoff frequency, and, optionally, a number of iterations. Typically, the Laplacian pyramidal transform operates on a grayscale version of the image 130, using only the luminance of each pixel 132; thus, the spatial frequency components of the Laplacian pyramidal transform pertain to the brightness of areas of the image 130.

Applying the discrete Laplace operator iteratively to the image 130 generates a Laplacian pyramid 140 having one or more levels. The levels contain spatial frequency information for areas of the image 130. The spatial frequency information represents an amount that the luminance of the image 130 varies over a distance set by the filter size. A first level 142 of the pyramid 140 (referred to as L0 commonly and herein) contains a luminance value 152 for each pixel 132 of the image 130. The luminance value 152 represents a frequency of the luminance of the corresponding pixel 132 in light of the Laplacian pyramid parameters; that is, the spatial frequency is calculated across the pixels 132 within the filter matrix, the pixel 132 being filtered placed at the center of the matrix, and the spatial frequency is filtered out if it does not exceed the cutoff frequency. To illustrate the pyramid 140, the first level 142 can be convolved back to the spatial domain, where it is represented by a high-pass filtered image of the same size as the original image 130; the luminance values 152 correspond to the luminance of the pixels in the high-pass filtered image.

The discrete Laplace operator computes a spatial derivative of each level to produce the next level in the pyramid 140, reducing the size (i.e., the number of luminance values) in each subsequent level by half. Thus, a plurality of first luminance values 152 of the first level 142 are downsampled into one second luminance value 154 of the second level 144 (referred to as L1 commonly and herein), a plurality of second luminance values 154 are downsampled into one third luminance value 156 of the third level 146 (referred to as L2 commonly and herein), and so on until a top level 148 is produced and contains just one luminance value. The frequencies filtered out between levels are modified by the derivative, such that each “distance” between levels corresponds to a spatial bandpass filter, the width of each frequency band depending on the resolution of the image. The number of levels 142-148 may be limited by setting a maximum number of iterations of the Laplacian pyramid. The processor 102 uses the corresponding luminance values of one or more of the levels to derive the focus information.

Referring to FIG. 2, a processor of the mobile device may execute device logic, including program instructions, software modules, hardware interfaces, and the like, to execute the image processing in a progression 200 of stages. The image capture stage 202 includes receiving one or more of the plurality of images to be compared, the images captured by the camera of the mobile device. In other embodiments, the images may be captured by multiple devices and sent to the processor. Various techniques for controlling the camera to capture the proper images are described below. In particular, at stage 202 the processor may fix the focal length of the camera before the images are captured. In some embodiments, the processor may receive all of the plurality of images at once and process them through the progression 200 accordingly. In other embodiments, the processor may receive each image as it is captured, and may process the image through the progression 200 immediately, even while subsequent images are still being captured and/or received.

The image crop stage 204 includes cropping the image to a reduced size. The size may be selected to make the images uniform. The crop may be to a uniform or non-uniform size and the crop region may be located over the region of interest in the image. Cropping may be performed in the center of the image, or may be offset spatially as determined by other factors, such as degree of brightness, hue, specularity or saturated values in the captured image. The image crop stage 204 may further include rotating the image and/or performing other image modifications.
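As a minimal sketch of the center crop with an optional spatial offset, the following Python function operates on an image stored as a list of pixel rows. The function name `center_crop`, the `offset` parameter, and the row-list representation are assumptions made for illustration only; how an offset would be derived from brightness, hue, or specularity is not shown.

```python
def center_crop(image, crop_h, crop_w, offset=(0, 0)):
    """Crop a crop_h x crop_w window from an image given as a list of rows.

    The window is centered by default; `offset` (rows, cols) shifts it.
    Assumes crop_h and crop_w do not exceed the image dimensions.
    """
    height, width = len(image), len(image[0])
    top = (height - crop_h) // 2 + offset[0]
    left = (width - crop_w) // 2 + offset[1]
    # Clamp the window so that it stays inside the image.
    top = min(max(top, 0), height - crop_h)
    left = min(max(left, 0), width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


# Hypothetical 4 x 6 image whose pixel value encodes its row and column.
image = [[r * 10 + c for c in range(6)] for r in range(4)]
cropped = center_crop(image, 2, 2)
```

Cropping every frame to the same window keeps the per-level energy values comparable from frame to frame.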

The luminance extraction stage 206 includes converting the color image to a grayscale image for processing by the Laplacian pyramid. Only the luminances of the pixels in the image are needed. Any suitable conversion technique may be used to preserve the luminance and remove the color in the image. For example, the image may be represented by a luminance channel and a chrominance channel, with channel information derived for each pixel from the pixel's RGB value, and the processor may extract the luminance channel to create a new grayscale image.
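One possible conversion can be sketched in Python as follows. The Rec. 601 luma weights and the function name `to_luminance` are assumptions chosen for illustration; the disclosure permits any suitable conversion technique.

```python
def to_luminance(rgb_image):
    """Convert rows of (R, G, B) tuples into rows of luminance values."""
    # Rec. 601 luma weights: an assumed choice, not mandated by the text.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


# Hypothetical 2 x 2 color image.
gray = to_luminance([[(255, 255, 255), (0, 0, 0)],
                     [(255, 0, 0), (0, 0, 255)]])
```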

The pyramid generation stage 208 includes transforming the grayscale image, using an iterative discrete Laplace operator, to produce a Laplacian pyramid for the image, the pyramid including at least a first level based on the grayscale image and a second level based on the first level, as described above. In an exemplary embodiment, the discrete Laplace operator uses a five-pixel by five-pixel matrix as the filter size. This filter size produces a reasonable compromise between computational efficiency and sharpness of bandpass cutoff frequency; other filter sizes may be used. Applying the discrete Laplace operator thus produces a plurality of first luminance values (i.e., L0 Laplacian values 280) from the grayscale image and a plurality of second luminance values (i.e., L1 Laplacian values 282) from the plurality of first luminance values.
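The following Python sketch illustrates one way band-pass levels of this kind could be produced. It is a simplified Laplacian-style pyramid, not necessarily the operator of the exemplary embodiment: each level is the difference between an image and its low-pass version, and the low-pass image is then downsampled by two in each dimension to seed the next level (the expand step of the classic construction is omitted). The 5-tap binomial kernel, the edge clamping, and the names `blur` and `laplacian_levels` are assumptions.

```python
# Assumed 5-tap binomial kernel, applied separably to give a 5 x 5 filter.
KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]


def blur(img):
    """Separable 5 x 5 low-pass filter; edges are handled by clamping."""
    h, w = len(img), len(img[0])

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    rows = [[sum(k * img[y][clamp(x + d - 2, w)] for d, k in enumerate(KERNEL))
             for x in range(w)] for y in range(h)]
    return [[sum(k * rows[clamp(y + d - 2, h)][x] for d, k in enumerate(KERNEL))
             for x in range(w)] for y in range(h)]


def laplacian_levels(gray, levels=2):
    """Return band-pass levels [L0, L1, ...] of a grayscale image."""
    out, current = [], gray
    for _ in range(levels):
        low = blur(current)
        # Band-pass level: detail removed by the low-pass filter.
        out.append([[c - l for c, l in zip(c_row, l_row)]
                    for c_row, l_row in zip(current, low)])
        # Downsample the low-pass image by two to form the next input.
        current = [row[::2] for row in low[::2]]
    return out


# A flat image has no detail, so every Laplacian value is zero.
flat = [[10.0] * 8 for _ in range(8)]
l0, l1 = laplacian_levels(flat)
```

A pure-Python version like this is slow for full-resolution frames; it is meant only to show the data flow from the grayscale image to L0 and L1.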

The energy calculation stage 210 includes determining energy values representing the Laplacian energy of each level (i.e., a first energy value 284 for L0 and a second energy value 286 for L1) of the pyramid. The Laplacian energy is a measurement of the frequency response across the entire level (i.e., within the frequency band represented by the level): Laplacian energy is relatively high when spatial frequency components in the original luminance image have large amplitudes within the bandpass region of the level, which in turn correspond to large luminance variations in the corresponding area of the transformed image. In some embodiments, the energy value of a level is determined by computing the square of each luminance value, and then computing the mean/average of the intermediate (i.e., squared) values to produce the energy value.
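The mean-of-squares computation described above can be sketched in a few lines of Python. The function name `laplacian_energy` and the list-of-rows representation of a level are assumptions for illustration.

```python
def laplacian_energy(level):
    """Mean of the squared Laplacian values of one pyramid level."""
    values = [v for row in level for v in row]
    return sum(v * v for v in values) / len(values)


# Hypothetical 2 x 2 level: squares are 1, 1, 4, 4, so the mean is 2.5.
energy = laplacian_energy([[1.0, -1.0], [2.0, -2.0]])
```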

The energy values can be used directly to select focused images. For example, in a plurality of images, the image wherein all of the energy values are at their peak values may have the best focus. However, this approach is locally myopic: even though the energy values are peaking, the energy value of one of the levels may be significantly higher than the others. This is particularly shown in images of a skin surface that exhibits “weak” features. Weak features are features that, in the image, are not significantly differentiated from the base appearance of the skin and/or from other features; that is, the base appearance of the skin forms a “background” color and texture of the image, and the weak features, such as wrinkles, furrows, pores, slight imperfections, and the like, do not stand out from the background. The images, including blurred images, of weak features may be dominated by image noise, such as additive white Gaussian noise, which is recorded as frequency variations in L0 but filtered out in subsequent levels. Correspondingly, the energy value of L0 is significantly higher than the energy value of L1. If the images are being processed as they are captured, a local peak of the L0 and L1 energy values can identify a blurred image as the best-focused image, when the best-focused image has actually not yet been captured.

A threshold comparison stage 212 accounts for such false identifications. The processor determines whether the energy values are above an energy threshold set to eliminate images that are too blurry. The processor also determines whether a ratio of the L0 energy value to the L1 energy value is approximately 1.0, i.e., the energy values are approximately equal. A range around 1.0 may be used to determine the extent of “approximately” equal. In one example, the ratio must be at least 0.5 and at most 1.9 to be considered approximately 1.0. The range may depend on the characteristics of the camera, such as focal length, image resolution, shutter speed, capture rate, etc. If the image meets these conditions, the processor may retain the image and its energy values for further processing; in various embodiments, the processor may also retain the images that do not meet the thresholds and their energy values, or just their energy values, or may discard such images and energy values.
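The two checks of this stage can be sketched as a single Python predicate. The default ratio range of 0.5 to 1.9 comes from the example above; the function name `is_valid_frame`, the argument order, and the sample threshold values are assumptions, since the disclosure does not give a numeric energy threshold.

```python
def is_valid_frame(e0, e1, energy_threshold, ratio_range=(0.5, 1.9)):
    """Return True if both energies clear the threshold and E(L0)/E(L1) is near 1.0."""
    # Reject blurry frames (low energy) and guard against division by zero.
    if e1 <= 0 or min(e0, e1) <= energy_threshold:
        return False
    low, high = ratio_range
    return low <= e0 / e1 <= high


sharp = is_valid_frame(10.0, 9.0, energy_threshold=5.0)   # ratio about 1.11
noisy = is_valid_frame(10.0, 2.0, energy_threshold=1.0)   # ratio 5.0, noise-dominated
blurry = is_valid_frame(3.0, 3.0, energy_threshold=5.0)   # energy too low
```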

These conditions may correspond to a global best-focused image, because the energy values reach their peak when there is the most frequency content (i.e., the image is sharpest), and the image is not dominated by noise when the energy values are approximately equal. In contrast, a near-blurred image, captured when the mobile device is separated from the subject by a distance that is less than the focal length, has a relatively high ratio because the L0 energy value is much higher than the L1 energy value; a far-blurred image, captured when the mobile device is separated from the subject by a distance that is greater than the focal length, has energy values that quickly fall below the energy threshold. Thus, in some embodiments, the processor may identify all of the “valid” images in the plurality of images by determining that the valid images have a first (i.e., L0) energy value that exceeds the energy threshold and is approximately equal to the image's second (i.e., L1) energy value. The processor may then identify the valid image with the highest first energy value as the best-focused image.

Referring again to FIG. 2, the illustrated embodiment may account for continuously received images in the plurality of images in a temporal filtering stage 214. This stage 214 may calculate a rolling or moving average across a predetermined number of the most recently received frames (i.e., images), such as by applying a boxcar filter across the last three received images: the current (i.e., last-received) image, a first previous image occurring in the image sequence immediately before the current image, and a second previous image occurring in the image sequence immediately before the first previous image (alternatively, these images may be referred to as a first image, a previous image occurring in the image sequence immediately before the first image, and a subsequent image occurring in the image sequence immediately after the first image). The temporal filtering stage 214 may extend the threshold comparison of the previous stage 212 to apply to an average of the values of the images in the boxcar filter.

Thus, the processor may calculate an average first energy value from the first energy values of the current, first previous, and second previous images, and may calculate an average second energy value from the second energy values of the current, first previous, and second previous images. The processor determines whether the average energy values are above the energy threshold and are approximately equal (i.e., the ratio of the average first energy value to the average second energy value is approximately 1.0). If so, the processor may add the first previous image (which is at the center of the boxcar filter; note that the current image has no subsequent image yet) and the average energy values to a queue for valid images whose average energy values meet the thresholds. The queue may have its own sequence, which takes the image order of the plurality of images but removes the images that are determined to not be valid. The queue may also have a minimum queue size corresponding to the number of images that the processor must identify as valid before it can proceed to the next stage 216.
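A minimal Python sketch of this three-frame boxcar and valid-image queue follows. It averages the energies, applies the ratio test to the averaged energies as described in this paragraph, and queues the center (first previous) frame. The class name `RollingFocusFilter`, its attributes, the frame identifiers, and the sample energies are all hypothetical.

```python
from collections import deque


class RollingFocusFilter:
    """Three-frame boxcar over per-frame (L0, L1) energy values."""

    def __init__(self, energy_threshold, ratio_range=(0.5, 1.9)):
        self.energy_threshold = energy_threshold
        self.ratio_range = ratio_range
        self.window = deque(maxlen=3)  # second previous, first previous, current
        self.valid_queue = []          # (frame_id, avg_e0, avg_e1) of valid frames

    def push(self, frame_id, e0, e1):
        """Add the current frame; return True if the center frame was queued."""
        self.window.append((frame_id, e0, e1))
        if len(self.window) < 3:
            return False  # not enough frames to average yet
        avg_e0 = sum(frame[1] for frame in self.window) / 3
        avg_e1 = sum(frame[2] for frame in self.window) / 3
        low, high = self.ratio_range
        valid = (avg_e1 > 0
                 and min(avg_e0, avg_e1) > self.energy_threshold
                 and low <= avg_e0 / avg_e1 <= high)
        if valid:
            center_id = self.window[1][0]  # the first previous image
            self.valid_queue.append((center_id, avg_e0, avg_e1))
        return valid


# Hypothetical stream of (L0 energy, L1 energy) pairs for frames 0 to 3.
focus_filter = RollingFocusFilter(energy_threshold=5.0)
stream = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0), (100.0, 1.0)]
results = [focus_filter.push(i, e0, e1) for i, (e0, e1) in enumerate(stream)]
```

In this run, frame 1 is queued once frame 2 arrives; the noise-dominated frame 3 pushes the averaged ratio out of range, so frame 2 is not queued.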

The best focus selection stage 216 includes determining the best-focused image of the plurality of images from the valid images in the queue. In one embodiment, such as when the camera has stopped capturing the plurality of images, the processor may select the valid image with the highest average first energy value as the best-focused image. In another embodiment, the camera may still be capturing images for processing, yet the best-focused image may be determined: the processor may determine whether any of the valid images has an average first energy value that is higher than those of the images immediately before and immediately after the valid image in the queue. If so, the processor may select that valid image as the best-focused image. Furthermore, the processor may control the camera to stop capturing the plurality of images.
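Both selection strategies can be sketched in Python over a queue of `(frame_id, average_first_energy)` pairs. The function names `find_peak` and `find_max`, the pair layout, and the sample values are assumptions for illustration.

```python
def find_peak(valid_queue):
    """Return the id of the first entry whose energy exceeds both queue neighbors."""
    triples = zip(valid_queue, valid_queue[1:], valid_queue[2:])
    for previous, candidate, following in triples:
        if candidate[1] > previous[1] and candidate[1] > following[1]:
            return candidate[0]
    return None  # no local peak yet; keep capturing


def find_max(valid_queue):
    """Return the id of the entry with the highest energy (capture finished)."""
    if not valid_queue:
        return None
    return max(valid_queue, key=lambda entry: entry[1])[0]


# Hypothetical queue of (frame_id, average first energy value).
queue = [(1, 5.0), (2, 9.0), (3, 7.0), (4, 12.0)]
streaming_choice = find_peak(queue)  # frame 2: higher than frames 1 and 3
final_choice = find_max(queue)       # frame 4: highest overall
```

The two results can differ, as in this example: the streaming rule stops at the first local peak, whereas the post-capture rule considers every valid image.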

The best-focused image is thus selected, and the processor may perform one or more actions associated with the best-focused image. Non-limiting examples include terminating image capture, deleting the video or all other images of the plurality of images, providing an alert or indicator to the user or to another device that the best-focused image has been obtained, presenting the best-focused image on the display of the mobile device, storing the best-focused image or sending it to a remote device for storage or processing, and the like. The actions may include processing the image according to a particular desired application. For example, where the images are of a skin surface, the best-focused image may be used in a cosmetic, dermatological, or other aesthetic or medical application. Such applications include determining conditions of the skin. In one embodiment, the processor may execute additional device logic to analyze skin features in the image. Non-limiting examples of the analysis include: determining a skin moisture content, skin texture, or skin color; identifying the presence or absence of acne or rosacea; comparing the image to previously captured images to estimate treatment progress, wound healing, advancement of wrinkles and/or furrows, color changes, and the like. Other suitable applications in which a best-focused image of a mobile device may be acted upon include image sensing, applied physics applications, robotics applications, and other applications in which finely detailed, close-up images of a textured surface are used.

FIG. 3 illustrates another embodiment of executing the image processing in a progression 300 of stages. The processor follows the progression 300 to utilize an additional level of the Laplacian pyramid, which adds complexity and thus processing time and resource overhead, but may more accurately produce the best-focused image when compared to the progression 200 of FIG. 2. The image capture stage 302, image cropping stage 304, and luminance extraction stage 306 proceed as described above with respect to stages 202-206 of FIG. 2. At the pyramid generation stage 308, the processor applies the discrete Laplace operator described above with respect to stage 208 to generate a Laplacian pyramid for the grayscale image. The Laplacian pyramid includes at least a first level based on the grayscale image, a second level based on the first level, and a third level based on the second level. As in stage 208, the levels have corresponding pluralities of luminance values representing the Laplacian values of the corresponding levels (i.e., L0 Laplacian values 380, L1 Laplacian values 382, and L2 Laplacian values 384).

The energy calculation stage 310 includes determining energy values representing the Laplacian energy of each level (i.e., a first energy value 386 for L0, a second energy value 388 for L1, and a third energy value 390 for L2) of the pyramid. Again, the energy values 386-390 can be used directly to select focused images. With respect to the weak features described above, the L2 energy value behaves similarly to the L1 energy value, being dominated by the energy value of L0 in near-blurred images, and quickly dropping below the energy threshold in far-blurred images.

A threshold comparison stage 312 includes determining whether the energy values are above the energy threshold. The processor also determines whether a ratio of the L0 energy value to the L1 energy value and a ratio of the L0 energy value to the L2 energy value are each approximately 1.0, i.e., the energy values are approximately equal. A temporal filtering stage 314 may include calculating a rolling or moving average across a predetermined number of the most recently received frames (i.e., images). The processor may calculate an average first energy value from the first energy values of the current, first previous, and second previous images, an average second energy value from the second energy values of the current, first previous, and second previous images, and an average third energy value from the third energy values of the current, first previous, and second previous images. The processor determines whether the average energy values are above the energy threshold and are approximately equal (i.e., the ratios of the average first energy value to the average second energy value and to the average third energy value are approximately 1.0). If so, the processor may add the first previous image and the average first energy value to a queue for valid images whose average energy values meet the thresholds, as described above.
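
The threshold comparison and temporal filtering may be sketched as follows, under stated assumptions. The values of `ENERGY_THRESHOLD` and `RATIO_TOLERANCE` are hypothetical placeholders, since the text says only that the ratios are "approximately 1.0"; the class `TemporalFilter` and the function `is_valid` are likewise illustrative names. The window of three frames corresponds to the current, first previous, and second previous images.

```python
from collections import deque
from typing import List, Optional, Sequence

ENERGY_THRESHOLD = 10.0  # hypothetical threshold, chosen for illustration only
RATIO_TOLERANCE = 0.25   # hypothetical meaning of "approximately 1.0"


class TemporalFilter:
    """Moving average of per-level energy values over the latest frames."""

    def __init__(self, window: int = 3) -> None:
        self.history = deque(maxlen=window)

    def push(self, energies: Sequence[float]) -> Optional[List[float]]:
        """Add one frame's energies; return the averages once the window is full."""
        self.history.append(tuple(energies))
        if len(self.history) < self.history.maxlen:
            return None
        n = len(self.history)
        return [sum(frame[i] for frame in self.history) / n
                for i in range(len(energies))]


def is_valid(avg: Sequence[float],
             threshold: float = ENERGY_THRESHOLD,
             tolerance: float = RATIO_TOLERANCE) -> bool:
    """Check that all averages exceed the threshold and are roughly equal."""
    if any(e <= threshold for e in avg):
        return False
    # Ratios of the average first energy value to each other average.
    return all(abs(avg[0] / e - 1.0) <= tolerance for e in avg[1:])


if __name__ == "__main__":
    filt = TemporalFilter()
    for frame_energies in ([12.0, 12.0, 12.0], [15.0, 15.0, 15.0], [18.0, 18.0, 18.0]):
        averages = filt.push(frame_energies)
    print(averages, is_valid(averages))
```

In this sketch the averages returned when the third frame arrives are attributed to the middle frame of the window (the first previous image), matching the description above.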

The best focus selection stage 316 proceeds as described above with respect to stage 216 of FIG. 2. The accuracy may be improved over the faster progression 200 of FIG. 2 due to the additional check of the energy and ratio corresponding to the third level of the Laplacian pyramid. FIGS. 4 and 5 plot the energy values and ratios for two exemplary image captures, in which the camera was moved from near the skin to far from the skin as 37 images were captured. The plots demonstrate that the energy values of L0, L1, and L2 peak for the sharpest images 402, 502, while the ratios L0/L1 and L0/L2 remain at approximately 1.0. The plots also demonstrate the dominance of the L0 energy when the camera is very close to the subject.

The correlative rising, peaking, and falling of the energy values, like those shown in the plots, correspond to the mobile device approaching and then passing a distance from the subject that is equal to the fixed focal length of the camera. FIG. 6 illustrates a method 600 of determining the best-focused image when the plurality of images is captured automatically as the mobile device is moved from a point within the focal distance from the subject to a point outside the focal distance from the subject. At step 601, the processor may set the focal length of the camera to a fixed position. Advantageously, the fixed position may be a minimum focal length achievable by the camera, as this ensures that the best-focused image captures the finest details of the surface. At step 602, the processor may receive one of the plurality of images. Receiving the image may comprise controlling the camera to capture the image. The processor may initiate the capture upon receiving an input signal, such as a user input actuating the shutter, a signal from a range sensor or accelerometer, or another input. The processor may control the camera to capture an image at a constant capture rate, such as a number (e.g., 5-15) of frames per second, and thus may automatically actuate the next capture according to a system clock.

At step 604, the processor may, optionally, crop the image to a range of interest or to a desired size. At step 606, the processor may produce a grayscale image from the cropped image. At step 608, the processor may apply the Laplacian operator iteratively to the grayscale image to produce the luminance values of the Laplacian pyramid as described above. At step 610, the processor may calculate the energy value for each relevant level of the Laplacian pyramid. At step 612, the processor may calculate the average energy values for the immediately previous image, or for the current image if there is no previous image, across the current image and one or more previous images, if any. At step 614, the processor may determine whether the previous image is a valid image, i.e., that the previous image's average energy values exceed the energy threshold and are approximately equal. If not, at step 616 the processor may save the energy values for the image and return to step 602 to capture the next image.

If the previous image is a valid image (step 614), at step 618 the processor may push the previous image to the valid image queue. At step 620, the processor determines whether a minimum number of valid images are in the queue. If not, the processor returns to step 602 to capture the next image. If so, at step 622 the processor compares the average first energy value of one or more of the valid images to the corresponding average first energy values of the immediately previous and immediately subsequent valid images in the queue. The processor may do this for all of the valid images, or only for some, or only for the valid image immediately previous to the previous image in the queue. If the comparison (step 622) does not yield a best-focused image (i.e., a valid image with an average first energy value higher than those of the immediately previous and immediately subsequent valid images in the queue), the processor returns to step 602 to capture the next image. If a best-focused image is determined, at step 624 the processor controls the camera to stop capturing the plurality of images, and at step 626 may perform additional actions associated with the best-focused image.
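
The comparison of steps 620 and 622 may be sketched as a search for a local maximum in the valid image queue. This is an illustrative sketch only: the queue is assumed to hold (image identifier, average first energy value) pairs in capture order, and `find_best_focused` and `min_valid` are hypothetical names, with the minimum of three chosen so that a middle entry has both neighbors.

```python
from typing import List, Optional, Tuple

# Each queue entry pairs a hypothetical image identifier with the image's
# average first energy value.
ValidEntry = Tuple[str, float]


def find_best_focused(queue: List[ValidEntry], min_valid: int = 3) -> Optional[str]:
    """Return the first valid image whose energy exceeds both of its neighbors.

    Returns None when the queue is too short or no peak has been passed yet,
    in which case the caller would capture another image.
    """
    if len(queue) < min_valid:
        return None
    for i in range(1, len(queue) - 1):
        previous_energy = queue[i - 1][1]
        this_energy = queue[i][1]
        next_energy = queue[i + 1][1]
        if this_energy > previous_energy and this_energy > next_energy:
            return queue[i][0]
    return None


if __name__ == "__main__":
    valid_queue = [("frame_03", 11.0), ("frame_04", 14.0), ("frame_05", 12.5)]
    print(find_best_focused(valid_queue))
```

While the energy values are still rising, the function returns None, which corresponds to returning to step 602 to capture the next image.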

Various steps of the best-focus determination processes, such as method 600, may include substeps or companion steps in which the processor controls components of the mobile device to interact with the user in order to facilitate the processes. For example, the processor may, at step 602, instruct the camera to capture the plurality of images for 4.5 seconds at a capture rate of 8 images per second. During the automated capture, the processor may receive a signal from an accelerometer of the mobile device, and may determine from the signal that the user is moving the mobile device too fast. The processor may generate an audible, visual, or tactile cue to alert the user to slow down.
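
The capture schedule in this example works out to 36 images spaced 0.125 seconds apart. The sketch below shows that arithmetic together with one possible motion check. How the processor decides that the device is moving too fast is not specified here, so the check is purely an assumption: it compares the magnitude of a gravity-compensated accelerometer reading against a hypothetical limit `ACCEL_LIMIT`.

```python
import math
from typing import Sequence

ACCEL_LIMIT = 2.0  # hypothetical limit in m/s^2, for illustration only


def planned_frame_count(duration_s: float, frames_per_s: float) -> int:
    """Number of images captured at a constant rate over the duration."""
    return int(round(duration_s * frames_per_s))


def capture_interval_s(frames_per_s: float) -> float:
    """Time between consecutive captures at a constant rate."""
    return 1.0 / frames_per_s


def moving_too_fast(accel_xyz: Sequence[float], limit: float = ACCEL_LIMIT) -> bool:
    """Assumed check: acceleration magnitude above the limit triggers a cue."""
    magnitude = math.sqrt(sum(a * a for a in accel_xyz))
    return magnitude > limit


if __name__ == "__main__":
    print(planned_frame_count(4.5, 8))       # 36 images
    print(capture_interval_s(8))             # 0.125 seconds
    print(moving_too_fast((3.0, 0.0, 0.0)))  # True, so a cue would be generated
```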

To interact with the user, the mobile device may implement a virtual personal assistant (VPA) that is configured to perform domain-specific tasks relevant to the capture and processing of the best-focused image. Among others, a suitable VPA architecture that may be implemented on the mobile device and configured to operate in the domain is the generic virtual assistant platform described in U.S. Pat. No. 9,082,402, owned by the present applicant and incorporated fully herein by reference to the extent permitted. The generic virtual assistant platform may receive a plug-in comprising at least a domain-specific language model and a domain-specific task flow. The language model may provide a vocabulary that is relevant to communicating the necessary actions to the user and understanding the user's inquiries and responses in a conversational natural language (e.g., speech-based) manner. The task flow identifies domain-specific actions to be taken or initiated by the computing device, such as identifying the points in the focus determination process when the processor will have to generate a cue for the user or receive an input from the user, as well as the points when the processor should survey components, operations, or data to determine whether interaction with the user is needed.

The VPA may advantageously be available at all times. The VPA is directly operated by the user and, in some embodiments, does not require any external support, thus ensuring a high degree of privacy. Referring to FIG. 7, while a local environment completely physically contained on the computing device 100 may be used, different distributed environments may also be used, as appropriate, to implement various embodiments. To facilitate operation of the VPA, the mobile device 100 may communicate with remote devices via a communication network 702. The network 702 can include any appropriate network, including an intranet, the Internet, a cellular network, a local area network, a satellite network or any other network and/or combination thereof. Components used for such a system can depend at least in part upon the type of network and/or environment selected. Protocols and components for communicating via such a network 702 are well known and will not be discussed in detail. Communication over the network 702 can be enabled by wired or wireless connections and combinations thereof. The communication network 702 may facilitate completion of tasks by providing a connection to servers 704 and data stores 706 that are remote from the computing device 100. Additionally, the communication network 702 may facilitate receipt by the computing device 100 of messages or other VPA data from the other user devices 708 and/or servers 704, as well as transmission from the computing device 100 of responses to messages, etc.

The mobile device 100 may communicate with one or more servers 704 to send and receive VPA data, such as updates to the language model and/or task flow. In particular, one or more of the servers 704 may be configured to parse and generate speech, and the mobile device 100 may send input audio and/or generated cues to the servers 704 for translation. The mobile device 100 may also directly access one or more remote data stores 706 to store and retrieve VPA data. One or more other user devices 708 may also communicate via the communication network 702, and may share data that improves VPA operation with the mobile device 100, the servers 704, and/or the data stores 706.

There may be one server 704 or several cooperating servers 704 of homogeneous or varying types, layers or other elements, processes or components, which may be chained or otherwise configured, which can interact with each other and with the mobile device 100 to perform tasks such as obtaining data from an appropriate data store 706 that is accessible locally to the cooperating server 704 or remotely over the network 702. Servers, as used, may be implemented in various ways, such as hardware devices or virtual computer systems. In some contexts, servers may refer to a programming module being executed on a computer system. The servers 704 can include any appropriate hardware, software and firmware for integrating with the data store as needed to execute aspects of one or more applications for the client device, handling some or all of the data access and business logic for an application. The servers 704 may provide access control services in cooperation with the data stores 706 and are able to generate content including text, graphics, audio, video and/or other content usable to be provided to the user, which may be served to the mobile device 100 in any suitable format, including HyperText Markup Language (“HTML”), Extensible Markup Language (“XML”), JavaScript, Cascading Style Sheets (“CSS”), or another appropriate client-side structured language. Content transferred to the mobile device 100 may be processed by the mobile device 100 to provide the content in one or more forms including forms that are perceptible to the user audibly, visually and/or through other senses including touch, taste, and/or smell. The handling of all requests and responses, as well as the delivery of content between the computing device 100 and the servers 704, can be handled by one of the servers 704, such as a web server using PHP: Hypertext Preprocessor (“PHP”), Python, Ruby, Perl, Java, HTML, XML, or another appropriate server-side structured language in this example. It should be understood that the cooperating servers 704 are not required and are merely example components, as structured code discussed can be executed on any appropriate device or host machine as discussed elsewhere. Further, operations described as being performed by a single device may, unless otherwise clear from context, be performed collectively by multiple devices, which may form a distributed and/or virtual system.

The data stores 706 can include several separate data tables, databases, data documents, dynamic data storage schemes and/or other data storage mechanisms and media for storing data relating to a particular aspect of the present disclosure, including without limitation the VPA data and the image data. It should be understood that there can be many aspects that may need to be stored in the data store, such as user information and access rights information, which can be stored in any appropriate mechanisms in the data stores 706. The data stores 706 may be operable, through logic associated therewith, to receive instructions from the servers 704 and obtain, update, or otherwise process data in response thereto. The servers 704 may provide static, dynamic or a combination of static and dynamic data in response to the received instructions. Dynamic data, such as data used in web logs (blogs), shopping applications, news services and other applications may be generated by server-side structured languages as described or may be provided by a content management system (“CMS”) operating on, or under the control of, the servers 704.

Each server 704 typically will include an operating system that provides executable program instructions for the general administration and operation of that server and typically will include a computer-readable storage medium (e.g., a hard disk, random access memory, read only memory, etc.) storing instructions that, when executed by a processor of the server, allow the server to perform its intended functions. Suitable implementations for the operating system and general functionality of the servers are known or commercially available and are readily implemented by persons having ordinary skill in the art, particularly in light of the disclosure. The environment, in one embodiment, is a distributed and/or virtual computing environment utilizing several computer systems and components that are interconnected via communication links, using one or more computer networks or direct connections. However, it will be appreciated by those of ordinary skill in the art that such a system could operate equally well in a system having fewer or a greater number of components than are illustrated in FIG. 7. Thus, the depiction in FIG. 7 should be taken as being illustrative in nature and not limiting to the scope of the disclosure.

Referring to FIG. 8, a VPA implemented on the user device may perform a method 800 of helping the user analyze a health concern, such as a skin condition. The method 800 may be adapted, using the domain-specific modules of the VPA, to analyze any physical surface that can be photographed using the mobile device and the processes described herein. At step 802, the VPA may prompt the user, using audible and/or visual cues from the mobile device, to enter input identifying the user's concern. At step 804, the VPA may receive the input from the user, and at step 806 the VPA may determine the identified concern from the user input. In some embodiments, the VPA may determine the concern using one or a combination of inputs relating to biological and treatment information for the user, such as a set of symptoms, the current product usage, daily treatment procedure, other daily habits, readings from biosensors, and the like. For example, an acne breakout may be due to one or a combination of factors including hormonal changes, age, oily skin, product side effects, features in the information gathered by the vision system, and the like. The VPA system may leverage all the information to narrow down the concern and the cause.

At step 808, the VPA may determine, based on the condition/concern to be evaluated, one or more parameters for capturing images that pertain to the concern. Non-limiting examples of such parameters include the target surface (e.g., skin), the region of interest (e.g., cheek), any features or objects expected to be depicted (e.g., wrinkles, blemishes), and the like, as well as characteristic values of the parameters in the image (e.g., a threshold sharpness, a minimum color deviation for determining edges of blemishes). The parameters may vary for initial information gathering and doing the analysis based on the cause. As an example, initial acne determination may be based on the location, size, and severity of skin features. When a cause or condition is determined (e.g., oily skin), the parameters may expand to include characteristics of the cause or condition (e.g., how oily the skin is).

At step 810, the VPA may use the parameters to aid the user in capturing the necessary best-focused images, as described above. In some embodiments, the VPA may include radiometric processing routines to create a mobile phone-based system for analyzing any physical surface. The domain-configured VPA may, in addition to performing other tasks, guide the user in positioning the phone's camera, providing feedback (in the form of cues or instructions) to ensure the camera is in the right position and at the correct angle or position with respect to the physical surface for optimal image capture and analysis quality. In one embodiment of guiding the user to properly position the phone camera at different locations on the body, such as cheek, forehead, chin, etc., the VPA may announce the desired location. Starting from a reference point, the VPA may guide the user with audible beeps (e.g., faster beeps indicate closeness to the desired location and a continuous beep sounds when the desired location is reached) and/or direct instruction (e.g., “tilt the phone down”, “about an inch closer”). This is particularly useful when the user is photographing a location where the user cannot see the display of the mobile device. At step 812, the VPA may obtain the best-focused image and determine that it satisfies the required parameters. In some embodiments, the processor may determine that the determined best-focused image or certain required features/objects is/are actually still blurry, or that the best-focused image does not contain the region of interest. The VPA may convey the information that the photos must be retaken, and may help the user take the photos again as described above.

In another embodiment of interacting with the user to secure the image, the VPA may incorporate context and expressed user intent in analyzing images. As an example, the user may respond to the VPA's prompt (step 802) by saying, “take the photo of the rash,” indicating the user's intent that the VPA analyze a rash on the user's skin. If the user specifies his or her intent (e.g., as determined from the input in step 806) or the VPA algorithmically infers the intent based on the context (e.g., as a component of determining the capture parameters at step 808), the VPA can optimize image analysis (e.g., during verification of the best-focused image at step 812) to the intent. For example, the VPA may evaluate the best-focused image to determine whether it can detect the rash in the image. If the intent is not achieved, the VPA may deem the image capture unsuccessful and further adjust feedback to alter the position of the camera, as appropriate, for re-capture (step 810).

Once the image having the required quality level and depicting the features of interest is obtained, it can be stored and the other images discarded (step 814). The best-focused image may be analyzed by the VPA (i.e., by a processor or an image analysis engine on the mobile device or remotely connected thereto). The analysis may include a comparison of the image to historical data, such as past-collected images of the region of interest, to identify changes in the depicted features and/or textured surface. Thus, for example, the user may periodically photograph the region of interest during or after a treatment of a skin condition, and the VPA may determine the change in the skin condition over time if the historical data exists. At step 816, the VPA may determine whether there is historical data pertaining to the identified condition, region, or concern. For example, the VPA may access a data store containing previously captured images of the subject, such images identified (e.g., using metadata) by the date captured, concern addressed, region or features depicted, etc. If there is no historical data, the VPA may perform other analyses (e.g., proceeding to step 818) or simply confirm to the user that the image is stored.

If there is historical data pertaining to the concern being evaluated, at step 820 the VPA may compare the image to the historical data. For example, the VPA may perform an image comparison of the newly captured image to one or more previous images and identify differences between them. If the differences pertain to features that are relevant to the concern being evaluated (e.g., disappearance, reduction in size, or change in color of skin blemishes in the region of interest), the VPA may identify the differences as changes in the condition. The VPA may identify the level of change, highlight the regions of change, and may also overlay the quantitative and qualitative information. At step 822, the VPA may display the identified changes to the user. For example, the VPA may render to the display of the mobile device a side-by-side image including the previous image and the new image. Additionally or alternatively, the VPA may render an annotated version of the new image that graphically identifies the changes, and may also include text or an audio presentation describing the changes. In another embodiment, the data store accessed in step 820 may include data that enables the VPA to identify the ideal or target progress of the condition concerned, and at step 822 the VPA may present to the user an evaluation of the treatment progress compared to the ideal or target progress. In some embodiments, the ideal condition may be provided as a general reference in the relevant industry. This could include the skin composition, oiliness level, the color of the skin, and other factors. In another embodiment, the new image and/or results of the comparison (step 820) may be sent to a server, a physician's or aesthetician's mobile device, or another computing device of a human expert, who may then provide the results back to the user.
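
One simple way to quantify a level of change and locate a region of change is sketched below. It is an assumption-laden illustration and not the disclosed comparison: the two images are assumed to be grayscale, equal in size, and already aligned, and the names `changed_region` and `min_delta` are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def changed_region(previous: Image, current: Image,
                   min_delta: float = 20.0) -> Tuple[float, Optional[Box]]:
    """Return the fraction of changed pixels and their bounding box.

    A pixel counts as changed when its luminance differs from the previous
    image by at least min_delta (an illustrative value).
    """
    h, w = len(current), len(current[0])
    changed = [(y, x)
               for y in range(h)
               for x in range(w)
               if abs(current[y][x] - previous[y][x]) >= min_delta]
    if not changed:
        return 0.0, None
    rows = [y for y, _ in changed]
    cols = [x for _, x in changed]
    box = (min(rows), min(cols), max(rows), max(cols))
    return len(changed) / (h * w), box


if __name__ == "__main__":
    before = [[100.0] * 3 for _ in range(3)]
    after = [row[:] for row in before]
    after[1][2] = 150.0  # one pixel brightened, standing in for a blemish change
    print(changed_region(before, after))
```

The returned fraction could serve as a quantitative level of change, and the bounding box could be drawn on the new image to highlight the region of change.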

Returning to step 818, in some embodiments the VPA can analyze the skin condition to determine recommendations for procedures, medicines or other products, lifestyle changes, and other treatments for the condition concerned. The VPA may provide the recommendations by considering the products being used, and suggest new products that could help the user's health and well-being. If historical data was analyzed, the VPA may incorporate the results of the analysis when determining the recommendations. The VPA may, at step 830, present the recommendations to the user via audio and/or video output of the mobile device. The presentation may include an explanation to the user of the data analyzed and the reasons that the recommendations are made. The system can also be a buddy that encourages the regular use of the product by providing related information.

FIG. 9 illustrates a logical arrangement of a set of general components of an example mobile device 100. In addition to the processor 102, the device logic 104, and the display 106 described above, the mobile device 100 may include a memory component 900, which can include many types of memory, data storage, or non-transitory computer-readable storage media, such as data stores for program instructions, images and other data, a removable memory for sharing information with other devices, etc. One or more cameras 902 or other image sensors may capture image or video content. A camera can include, or be based at least in part upon, any appropriate technology, such as a CCD or CMOS image sensor having a sufficient resolution, focal range, and/or viewable area, to capture an image of the user when the user is operating the device. An image sensor can include a camera or infrared sensor that is able to image projected images or other objects in the vicinity of the device. It should be understood that image capture can be performed using a single image, multiple images, periodic imaging, continuous image capturing, image streaming, etc. Further, a device can include the ability to start and/or stop image capture, such as when receiving a command from a user, application, or other device. The mobile device 100 can similarly include at least one audio component 904, such as a mono or stereo microphone or microphone array, operable to capture audio information from at least one primary direction. A microphone can be a uni- or omni-directional microphone as known for such devices.

The mobile device 100 also can include one or more orientation and/or motion sensors 906. Such sensor(s) can include an accelerometer or gyroscope operable to detect an orientation and/or change in orientation, or an electronic or digital compass, which can indicate a direction in which the device is determined to be facing. The mechanism(s) also (or alternatively) can include or comprise a global positioning system (GPS) or similar positioning element operable to determine relative coordinates for a position of the mobile device, as well as information about relatively large movements of the device. The device can include other elements as well, such as may enable location determinations through triangulation or another such approach. These mechanisms can communicate with the processor 102, whereby the device can perform any of a number of actions such as detecting tilt as user input.

The mobile device 100 includes various power components 908 known in the art for providing power to a mobile device, which can include capacitive charging elements for use with a power pad or similar device. The mobile device can include one or more communication elements or networking sub-systems 910, such as a Wi-Fi, Bluetooth, radio frequency (RF), wired, or wireless communication system. The device in many embodiments can communicate with a network, such as the Internet, and may be able to communicate with other such devices. In some embodiments the device can include at least one additional input element 912 able to receive conventional input from a user. This conventional input can include, for example, a push button, touch pad, touchscreen, wheel, joystick, keyboard, mouse, keypad, or any other such component or element whereby a user can input a command to the device. In some embodiments, however, such a device might not include any buttons at all, and might be controlled only through a combination of visual and audio commands, such that a user can control the device without having to be in contact with the device.

The various embodiments further can be implemented in a wide variety of operating environments, which in some cases can include one or more user computers, computing devices or processing devices that can be used to operate any of a number of applications. User or client devices can include any of a number of general purpose personal computers, such as desktop, laptop or tablet computers running a standard operating system, as well as cellular, wireless and handheld devices running mobile software and capable of supporting a number of networking and messaging protocols. Such a system also can include a number of workstations running any of a variety of commercially available operating systems and other known applications for purposes such as development and database management. These devices also can include other electronic devices, such as dummy terminals, thin-clients, gaming systems and other devices capable of communicating via a network. These devices also can include virtual devices such as virtual machines, hypervisors and other virtual devices capable of communicating via a network.

Various embodiments of the present disclosure utilize a network that would be familiar to those skilled in the art for supporting communications using any of a variety of commercially available protocols, such as Transmission Control Protocol/Internet Protocol (“TCP/IP”), User Datagram Protocol (“UDP”), protocols operating in various layers of the Open System Interconnection (“OSI”) model, File Transfer Protocol (“FTP”), Universal Plug and Play (“UPnP”), Network File System (“NFS”), Common Internet File System (“CIFS”) and AppleTalk. The network can be, for example, a local area network, a wide-area network, a virtual private network, the Internet, an intranet, an extranet, a public switched telephone network, an infrared network, a wireless network, a satellite network, and any combination thereof.

In embodiments utilizing a web server, the web server can run any of a variety of server or mid-tier applications, including Hypertext Transfer Protocol (“HTTP”) servers, FTP servers, Common Gateway Interface (“CGI”) servers, data servers, Java servers, Apache servers, and business application servers. The server(s) also may be capable of executing programs or scripts in response to requests from user devices, such as by executing one or more web applications that may be implemented as one or more scripts or programs written in any programming language, such as Java®, C, C# or C++, or any scripting language, such as Ruby, PHP, Perl, Python or TCL, as well as combinations thereof. The server(s) may also include database servers, including those commercially available from Oracle®, Microsoft®, Sybase®, and IBM® as well as open-source servers such as MySQL, Postgres, SQLite, MongoDB, and any other server capable of storing, retrieving, and accessing structured or unstructured data. Database servers may include table-based servers, document-based servers, unstructured servers, relational servers, non-relational servers or combinations of these and/or other database servers.

The environment can include a variety of data stores and other memory and storage media as discussed above. These can reside in a variety of locations, such as on a storage medium local to (and/or resident in) one or more of the computers or remote from any or all of the computers across the network. In a particular set of embodiments, the information may reside in a storage-area network (“SAN”) familiar to those skilled in the art. Similarly, any necessary files for performing the functions attributed to the computers, servers or other network devices may be stored locally and/or remotely, as appropriate. Where a system includes computerized devices, each such device can include hardware elements that may be electrically coupled via a bus, the elements including, for example, a central processing unit (“CPU” or “processor”), an input device (e.g., a mouse, keyboard, controller, touch screen or keypad), and an output device (e.g., a display device, printer or speaker). Such a system may also include one or more storage devices, such as disk drives, optical storage devices and solid-state storage devices such as random access memory (“RAM”) or read-only memory (“ROM”), as well as removable media devices, memory cards, flash cards, etc.

Such devices also can include a computer-readable storage media reader, a communications device (e.g., a modem, a wireless or wired network card, an infrared communication device, etc.), and working memory as described above. The computer-readable storage media reader can be connected with, or configured to receive, a computer-readable storage medium, representing remote, local, fixed, and/or removable storage devices as well as storage media for temporarily and/or more permanently containing, storing, transmitting, and retrieving computer-readable information. The system and various devices also typically will include a number of software applications, modules, services, or other elements located within a working memory device, including an operating system and application programs, such as a client application or web browser. It should be appreciated that alternate embodiments may have numerous variations from that described above. For example, customized hardware might also be used and/or particular elements might be implemented in hardware, software (including portable software, such as applets) or both. Further, connection to other computing devices such as network input/output devices may be employed.

Storage media and computer readable media for containing code, or portions of code, can include any appropriate media known or used in the art, including storage media and communication media, such as volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage and/or transmission of information such as computer readable instructions, data structures, program modules or other data, including RAM, ROM, Electrically Erasable Programmable Read-Only Memory (“EEPROM”), flash memory or other memory technology, Compact Disc Read-Only Memory (“CD-ROM”), digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices or any other medium which can be used to store the desired information and which can be accessed by the system device. Based on the disclosure and teachings provided, a person of ordinary skill in the art will appreciate other ways and/or methods to implement the various embodiments.

ADDITIONAL EXAMPLES

Illustrative examples of the technologies disclosed herein are provided below. An embodiment of the technologies may include any one or more, and any combination of, the examples described below.

In an example 1, a method of capturing, using a mobile device, a best-focused image of a skin surface of a subject comprises setting a camera of the mobile device to a fixed focal length, capturing, using the camera, a current image of a plurality of images of the skin surface, the plurality of images having a sequence and including a first previous image captured, using the camera, immediately previously to the current image and a second previous image captured, using the camera, immediately previously to the first previous image, producing a grayscale image from the current image, transforming the grayscale image, using a Laplacian pyramid, to produce a plurality of first luminance values from the grayscale image and a plurality of second luminance values from the plurality of first luminance values, averaging a plurality of first squared values, each comprising a square of a corresponding first luminance value of the plurality of first luminance values, to produce a first energy value, averaging a plurality of second squared values, each comprising a square of a corresponding second luminance value of the plurality of second luminance values, to produce a second energy value, calculating a first ratio of the first energy value to the second energy value, calculating, as an average first energy value of the first previous image, an average of the first energy value, a corresponding first energy value of the first previous image, and a corresponding first energy value of the second previous image, calculating, as an average first ratio of the first previous image, an average of the first ratio, a corresponding first ratio of the first previous image, and a corresponding first ratio of the second previous image, determining that the first previous image is one of a plurality of valid images, wherein each valid image of the plurality of valid images is an image of the plurality of images and has a corresponding average first energy value above an energy threshold value and a corresponding average first ratio approximately equal to 1.0, determining that a first valid image of the plurality of valid images is the best-focused image, wherein the first valid image has a corresponding average first energy value that is greater than the corresponding average first energy values of a previous valid image captured immediately before the first valid image and a subsequent valid image captured immediately after the first valid image, and performing an action associated with the best-focused image.
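The per-frame energy computation recited in example 1 can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the 3x3 box blur stands in for whatever low-pass kernel the pyramid actually uses, the second level is built in the conventional way from the downsampled low-pass image, and the names `blur`, `downsample`, `laplacian_energies`, and `energy_ratio` are hypothetical.

```python
def blur(img):
    """3x3 box blur with edge clamping (assumed stand-in for the pyramid's low-pass filter)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def downsample(img):
    """Keep every second row and column."""
    return [row[::2] for row in img[::2]]


def laplacian_energies(gray, levels=2):
    """Return the mean squared band-pass value of each pyramid level."""
    energies = []
    current = [[float(v) for v in row] for row in gray]
    for _ in range(levels):
        low = blur(current)
        # Band-pass (Laplacian) level: the detail removed by the blur.
        band = [[c - l for c, l in zip(cr, lr)] for cr, lr in zip(current, low)]
        count = len(band) * len(band[0])
        energies.append(sum(v * v for row in band for v in row) / count)
        # The next level is computed from the downsampled low-pass image.
        current = downsample(low)
    return energies


def energy_ratio(first_energy, second_energy, eps=1e-12):
    """First ratio: first energy over second energy (eps is an assumed zero-division guard)."""
    return first_energy / (second_energy + eps)
```

Passing `levels=3` would also yield the third energy value described in example 4, under the same assumptions.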

An example 2 includes the subject matter of example 1, wherein capturing the current image comprises automatically capturing the plurality of images at a capture rate as the mobile device is continuously moved between a first point that is a first distance away from the skin surface and a second point that is a second distance away from the skin surface, the first distance being less than the focal length and the second distance being greater than the focal length and wherein performing the action comprises stopping the capturing of the plurality of images, such that the current image is the last captured image of the plurality of images, the method further comprising, before determining the best-focused image, determining that the plurality of valid images includes at least a minimum number of the plurality of images.

An example 3 includes the subject matter of any of examples 1 and/or 2, wherein the skin surface is located such that a display of the mobile device is not viewable by the subject while the subject is moving the mobile device during capture of the plurality of images, and capturing the plurality of images comprises determining, using an accelerometer of the mobile device, that a speed at which the mobile device is being moved exceeds a predetermined speed limit and producing, using the mobile device, an audible alert that indicates to the subject to slow movement of the mobile device, wherein performing the action further comprises producing, using the mobile device, an audible indication that the best-focused image has been captured.

An example 4 includes the subject matter of any of examples 1, 2, and/or 3, wherein the method further includes the steps of further transforming the grayscale image, using the Laplacian pyramid, to produce a plurality of third luminance values from the plurality of second luminance values, averaging a plurality of third squared values, each comprising a square of a corresponding third luminance value of the plurality of third luminance values, to produce a third energy value, calculating a second ratio of the first energy value to the third energy value, calculating, as an average second energy value of the first previous image, an average of the second energy value, a corresponding second energy value of the first previous image, and a corresponding second energy value of the second previous image, calculating, as an average third energy value of the first previous image, an average of the third energy value, a corresponding third energy value of the first previous image, and a corresponding third energy value of the second previous image, and calculating, as an average second ratio of the first previous image, an average of the second ratio, a corresponding second ratio of the first previous image, and a corresponding second ratio of the second previous image, wherein each valid image of the plurality of valid images further has a corresponding average second energy value and a corresponding average third energy value both above the energy threshold value and a corresponding average second ratio approximately equal to 1.0.

In an example 5, a method of determining, using a mobile device, a best-focused image of a textured surface comprises receiving a plurality of images of the textured surface, the plurality of images captured by a camera of the mobile device while the camera is set to a fixed focal length, applying a Laplacian pyramid to a first image of the plurality of images to generate a Laplacian pyramid having a first level based on the first image and a second level based on the first level, determining a first energy value of the first image, the first energy value representing a Laplacian energy of the first level, determining a second energy value of the first image, the second energy value representing a Laplacian energy of the second level, determining that the first energy value exceeds an energy threshold and is approximately equal to the second energy value, determining, based at least in part on the first energy value of the first image and a corresponding first energy value of each valid image of a plurality of valid images, that the first image is the best-focused image, the plurality of valid images comprising all of the plurality of images having a corresponding first energy value that exceeds the energy threshold and is approximately equal to a corresponding second energy value, and performing an action associated with the best-focused image.

An example 6 includes the subject matter of example 5, wherein receiving the plurality of images comprises controlling the camera to capture the plurality of images at a capture rate as the mobile device is continuously moved between a first point that is a first distance away from the textured surface and a second point that is a second distance away from the textured surface, the first distance being less than the focal length and the second distance being greater than the focal length.

An example 7 includes the subject matter of any of examples 5 and/or 6, wherein receiving the plurality of images further comprises determining one or more conditions of the mobile device and producing, using the mobile device, one or more audible cues based on the one or more conditions, the one or more audible cues aiding a user of the mobile device to position the mobile device for capturing one or more of the plurality of images.

An example 8 includes the subject matter of any of examples 5, 6, and/or 7, wherein determining the one or more conditions comprises determining, using an accelerometer of the mobile device, that a speed at which the mobile device is being moved exceeds a predetermined speed limit and wherein producing the one or more audible cues comprises producing an audible alert that indicates to the user to slow movement of the mobile device.
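One way the speed condition of example 8 might be evaluated is sketched below. The examples do not say how speed is derived from the accelerometer, so this sketch assumes gravity-compensated acceleration samples along the axis of motion, taken at a fixed interval and integrated to a speed estimate. The function name and parameters are hypothetical.

```python
def exceeds_speed_limit(accel_samples, dt, speed_limit):
    """Integrate acceleration samples (m/s^2) taken every dt seconds and
    report whether the estimated speed ever exceeds speed_limit (m/s).

    Assumes the device starts at rest and that the samples are already
    gravity-compensated along the direction of motion.
    """
    speed = 0.0
    for accel in accel_samples:
        speed += accel * dt  # simple rectangular integration
        if abs(speed) > speed_limit:
            return True
    return False
```

A caller would trigger the audible alert whenever this check returns `True`. In practice integration drift would need to be handled, which this sketch ignores.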

An example 9 includes the subject matter of any of examples 5, 6, 7, and/or 8, wherein the plurality of images includes a current image captured immediately subsequently to the first image, and wherein performing the action comprises controlling the camera to stop capturing the plurality of images such that the current image is captured last of the plurality of images.

An example 10 includes the subject matter of any of examples 5, 6, 7, 8, and/or 9, wherein determining that the first image is the best-focused image comprises calculating an average first energy value from corresponding first energy values of the first image, the current image, and a previous image of the plurality of images, the previous image being captured immediately previously to the first image, calculating an average first ratio from a plurality of first ratios, each associated with a corresponding image of the plurality of images and comparing a corresponding first energy value of the image to a corresponding second energy value of the image, the plurality of first ratios including a corresponding first ratio for each of the first image, the previous image, and the current image, determining that the average first energy value exceeds the energy threshold and the average first ratio is approximately 1.0, and determining that the average first energy value of the first image is greater than a corresponding average first energy value of each of a previous valid image of the plurality of valid images, the previous valid image captured immediately before the first image and a subsequent valid image of the plurality of valid images, the subsequent valid image captured immediately after the first image.
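The selection logic of example 10 can be sketched as follows, given per-frame first and second energy values in capture order: a three-frame average, a validity test, and a search for a local maximum among the valid frames. The tolerance used for "approximately 1.0", the `min_valid` default, and the function name are assumptions for illustration only.

```python
def select_best_focused(first_energies, second_energies, energy_threshold,
                        ratio_tolerance=0.1, min_valid=3):
    """Return the index of the best-focused frame, or None if none qualifies."""
    n = len(first_energies)
    ratios = [a / b if b > 0 else float("inf")
              for a, b in zip(first_energies, second_energies)]

    # Average each frame with its immediate neighbours in capture order.
    valid = []  # (frame index, averaged first energy) for valid frames only
    for i in range(1, n - 1):
        avg_energy = (first_energies[i - 1] + first_energies[i]
                      + first_energies[i + 1]) / 3.0
        avg_ratio = (ratios[i - 1] + ratios[i] + ratios[i + 1]) / 3.0
        if avg_energy > energy_threshold and abs(avg_ratio - 1.0) <= ratio_tolerance:
            valid.append((i, avg_energy))

    if len(valid) < min_valid:
        return None

    # Best-focused frame: averaged energy greater than that of the valid
    # frames immediately before and after it.
    for k in range(1, len(valid) - 1):
        if valid[k][1] > valid[k - 1][1] and valid[k][1] > valid[k + 1][1]:
            return valid[k][0]
    return None
```

For instance, with first energies `[1, 2, 5, 9, 12, 9, 5, 2, 1]`, identical second energies, and a threshold of 3, the sketch selects frame index 4.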

An example 11 includes the subject matter of any of examples 5, 6, 7, 8, 9, and/or 10, wherein applying the Laplacian pyramid comprises selecting one or more parameters of the Laplacian pyramid such that a first Laplacian pyramid based on a first unfocused image captured when the mobile device is a first distance away from the textured surface, the first distance being less than the focal length, exhibits a corresponding first level and a corresponding second level in which a Laplacian energy of the first level is higher than and not approximately equal to a Laplacian energy of the second level and a second Laplacian pyramid based on a second unfocused image captured when the mobile device is a second distance away from the textured surface, the second distance being greater than the focal length, exhibits a corresponding first level in which a Laplacian energy of the first level does not exceed the energy threshold.
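The two unfocused signatures described in example 11, together with the validity condition of example 5, suggest a simple per-frame classification, sketched below. The label strings, the tolerance for "approximately equal", and the function name are hypothetical, and the sketch assumes the pyramid parameters have already been chosen so that these signatures hold.

```python
def classify_frame(first_energy, second_energy, energy_threshold,
                   ratio_tolerance=0.1):
    """Label a frame from its first- and second-level Laplacian energies."""
    if first_energy <= energy_threshold:
        # Signature of a capture farther away than the focal length.
        return "beyond_focus"
    approx_equal = (second_energy > 0
                    and abs(first_energy / second_energy - 1.0) <= ratio_tolerance)
    if approx_equal:
        # Above the threshold and approximately equal energies: a valid image.
        return "candidate"
    if first_energy > second_energy:
        # Signature of a capture closer than the focal length.
        return "inside_focus"
    return "invalid"
```

Under these assumptions, `classify_frame(10, 4, 1.0)` returns `"inside_focus"` and `classify_frame(10, 9.8, 1.0)` returns `"candidate"`.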

An example 12 includes the subject matter of any of examples 5, 6, 7, 8, 9, 10, and/or 11, wherein performing the action comprises determining that a blurriness of the first image exceeds a threshold blurriness and producing, using the mobile device, a cue indicating that the plurality of images should be discarded and a plurality of new images of the textured surface must be captured.

An example 13 includes the subject matter of any of examples 5, 6, 7, 8, 9, 10, 11, and/or 12, wherein the textured surface is a skin surface of a subject, the skin surface having a base appearance and one or more weak features that each are not significantly differentiated from the base appearance or from the other one or more weak features, and wherein performing the action comprises detecting the one or more weak features in the best-focused image and generating information about the one or more weak features.

An example 14 includes the subject matter of any of examples 5, 6, 7, 8, 9, 10, 11, 12, and/or 13, wherein the one or more weak features are selected from the group comprising furrows, wrinkles, and pores.

In an example 15, a mobile device comprises a camera set to a focal length, memory storing device logic and a Laplacian pyramid, and a processor in electronic communication with the memory and the camera, the processor executing the device logic to receive, from the camera, a plurality of images of a region of interest, apply the discrete Laplace operator to each image to generate a corresponding Laplacian pyramid having a first level based on the image and a second level based on the first level, determine, for each image of the plurality of images, a corresponding first energy value representing Laplacian energy of the first level of the corresponding pyramid, determine, for each image of the plurality of images, a corresponding second energy value representing Laplacian energy of the second level of the corresponding pyramid, determine a best-focused image of the plurality of images based at least in part on the corresponding first energy value of each valid image of a plurality of valid images including the best-focused image, wherein each valid image of the plurality of valid images is one of the plurality of images and has a first energy value that exceeds an energy threshold and is approximately equal to a second energy value, and perform an action based on the best-focused image.

An example 16 includes the subject matter of example 15, wherein the discrete Laplace operator comprises a high-pass filtering kernel of a predetermined size and a predetermined frequency, the predetermined size and the predetermined frequency selected such that a first Laplacian pyramid for a first unfocused image captured when the mobile device is a first distance away from a subject, the first distance being less than the focal length, exhibits a corresponding first level and a corresponding second level in which a Laplacian energy of the first level is higher than and not approximately equal to a Laplacian energy of the second level and a second Laplacian pyramid based on a second unfocused image captured when the mobile device is a second distance away from the subject, the second distance being greater than the focal length, exhibits a corresponding first level in which a Laplacian energy of the first level does not exceed the energy threshold.

An example 17 includes the subject matter of any of examples 15 and/or 16, further comprising an autofocus motor, wherein the focal length of the camera is adjustable by the autofocus motor from a minimum focal length to a maximum focal length, and wherein to receive the plurality of images, the processor executes the device logic to fix the focal length at the minimum focal length, and to disable the autofocus motor, during capture by the camera of the plurality of images.

An example 18 includes the subject matter of any of examples 15, 16, and/or 17, further comprising a range sensor in electronic communication with the processor, wherein the region of interest includes a textured surface and to receive the plurality of images, the processor executes the device logic to receive a signal from the range sensor, determine that the signal indicates that the range sensor detected that the mobile device is positioned at a first point that is a first distance away from the textured surface, the first distance being less than the focal length, and control the camera to capture the plurality of images at a capture rate as the mobile device is continuously moved from the first point to a second point that is a second distance away from the textured surface, the second distance being greater than the focal length.

An example 19 includes the subject matter of any of examples 15, 16, 17, and/or 18, wherein the processor receives each image of the plurality of images as the image is captured, such that the plurality of images has a first sequence and the plurality of valid images has a second sequence comprising the first sequence with each image of the plurality of images that is not one of the plurality of valid images removed and wherein each valid image of the plurality of valid images further has a corresponding average first energy value and a corresponding average second energy value that exceed the energy threshold and are approximately equal, the processor executing the device logic to calculate the corresponding average first energy value from the corresponding first energy values of the valid image, a previous image positioned immediately before the valid image in the first sequence, and a subsequent image positioned immediately after the valid image in the first sequence, calculate the corresponding average second energy value from the corresponding second energy values of the valid image, the previous image, and the subsequent image and determine that the corresponding average first energy value exceeds the energy threshold and is approximately equal to the corresponding average second energy value, wherein to determine the best-focused image, the processor executes the device logic to determine that the corresponding average first energy value of a first image of the plurality of valid images is greater than the corresponding average first energy value of each of a previous valid image of the plurality of valid images, the previous valid image positioned immediately before the first image in the second sequence and a subsequent valid image of the plurality of valid images, the subsequent valid image positioned immediately after the first image in the second sequence, wherein to perform the action, the processor executes the device logic to control the camera to stop capturing the plurality of images.

An example 20 includes the subject matter of any of examples 15, 16, 17, 18, and/or 19, further comprising a speaker, wherein to receive the plurality of images, the processor executes the device logic to determine one or more conditions of the mobile device, generate one or more audible cues based on the one or more conditions, the one or more audible cues aiding a user of the mobile device to position the mobile device for capturing one or more of the plurality of images, and output the one or more audible cues via the speaker.

What is claimed is:
 1. A method of capturing, using a mobile device, a best-focused image of a skin surface of a subject, the method comprising: setting a camera of the mobile device to a fixed focal length; capturing, using the camera, a current image of a plurality of images of the skin surface, the plurality of images having a sequence and including a first previous image captured, using the camera, previously to the current image and a second previous image captured, using the camera, previously to the first previous image; producing a modified image from the current image; transforming the modified image, using a Laplacian pyramid, to produce a plurality of first luminance values from the modified image and a plurality of second luminance values from the plurality of first luminance values; averaging a plurality of first squared values, each comprising a square of a corresponding first luminance value of the plurality of first luminance values, to produce a first energy value; averaging a plurality of second squared values, each comprising a square of a corresponding second luminance value of the plurality of second luminance values, to produce a second energy value; calculating a first ratio of the first energy value to the second energy value; calculating, as an average first energy value of the first previous image, an average of the first energy value, a corresponding first energy value of the first previous image, and a corresponding first energy value of the second previous image; calculating, as an average first ratio of the first previous image, an average of the first ratio, a corresponding first ratio of the first previous image, and a corresponding first ratio of the second previous image; determining that the first previous image is one of a plurality of valid images, wherein each valid image of the plurality of valid images is an image of the plurality of images and has: a corresponding average first energy value above an energy threshold value; and a corresponding average first ratio approximately equal to 1.0; determining that a first valid image of the plurality of valid images is the best-focused image, wherein the first valid image has a corresponding average first energy value that is greater than the corresponding average first energy values of: a previous valid image captured immediately before the first valid image; and a subsequent valid image captured immediately after the first valid image; and performing an action associated with the best-focused image.
 2. The method of claim 1, wherein capturing the current image comprises automatically capturing the plurality of images at a capture rate as the mobile device is continuously moved between a first point that is a first distance away from the skin surface and a second point that is a second distance away from the skin surface, the first distance being less than the focal length and the second distance being greater than the focal length; and wherein performing the action comprises stopping the capturing of the plurality of images, such that the current image is the last captured image of the plurality of images; the method further comprising, before determining the best-focused image, determining that the plurality of valid images includes at least a minimum number of the plurality of images.
 3. The method of claim 2, wherein the skin surface is located such that a display of the mobile device is not viewable by the subject while the subject is moving the mobile device during capture of the plurality of images, and capturing the plurality of images comprises: determining, using an accelerometer of the mobile device, that a speed at which the mobile device is being moved exceeds a predetermined speed limit; and producing, using the mobile device, an audible alert that indicates to the subject to slow movement of the mobile device; and wherein performing the action further comprises producing, using the mobile device, an audible indication that the best-focused image has been captured.
 4. The method of claim 1, further comprising: further transforming the modified image, using the Laplacian pyramid, to produce a plurality of third luminance values from the plurality of second luminance values; averaging a plurality of third squared values, each comprising a square of a corresponding third luminance value of the plurality of third luminance values, to produce a third energy value; calculating a second ratio of the first energy value to the third energy value; calculating, as an average second energy value of the first previous image, an average of the second energy value, a corresponding second energy value of the first previous image, and a corresponding second energy value of the second previous image; calculating, as an average third energy value of the first previous image, an average of the third energy value, a corresponding third energy value of the first previous image, and a corresponding third energy value of the second previous image; and calculating, as an average second ratio of the first previous image, an average of the second ratio, a corresponding second ratio of the first previous image, and a corresponding second ratio of the second previous image; wherein each valid image of the plurality of valid images further has: a corresponding average second energy value and a corresponding average third energy value both above the energy threshold value; and a corresponding average second ratio approximately equal to 1.0.
 5. A method of determining, using a mobile device, a best-focused image of a textured surface, the method comprising: receiving a plurality of images of the textured surface, the plurality of images captured by a camera of the mobile device while the camera is set to a fixed focal length; applying a Laplacian pyramid to a first image of the plurality of images to generate a Laplacian pyramid having a first level based on the first image and a second level based on the first level; determining a first energy value of the first image, the first energy value representing a Laplacian energy of the first level; determining a second energy value of the first image, the second energy value representing a Laplacian energy of the second level; determining that the first energy value exceeds an energy threshold and is approximately equal to the second energy value; determining, based at least in part on the first energy value of the first image and a corresponding first energy value of each valid image of a plurality of valid images, that the first image is the best-focused image, the plurality of valid images comprising all of the plurality of images having a corresponding first energy value that exceeds the energy threshold and is approximately equal to a corresponding second energy value; and performing an action associated with the best-focused image.
 6. The method of claim 5, wherein receiving the plurality of images comprises controlling the camera to capture the plurality of images at a capture rate as the mobile device is continuously moved between a first point that is a first distance away from the textured surface and a second point that is a second distance away from the textured surface, the first distance being less than the focal length and the second distance being greater than the focal length.
 7. The method of claim 6, wherein receiving the plurality of images further comprises: determining one or more conditions of the mobile device; and producing, using the mobile device, one or more audible cues based on the one or more conditions, the one or more audible cues aiding a user of the mobile device to position the mobile device for capturing one or more of the plurality of images.
 8. The method of claim 7, wherein determining the one or more conditions comprises determining, using an accelerometer of the mobile device, that a speed at which the mobile device is being moved exceeds a predetermined speed limit; and wherein producing the one or more audible cues comprises producing an audible alert that indicates to the user to slow movement of the mobile device.
 9. The method of claim 6, wherein the plurality of images includes a current image captured immediately subsequently to the first image, and wherein performing the action comprises controlling the camera to stop capturing the plurality of images such that the current image is captured last of the plurality of images.
 10. The method of claim 9, wherein determining that the first image is the best-focused image comprises: calculating an average first energy value from corresponding first energy values of the first image, the current image, and a previous image of the plurality of images, the previous image being captured immediately previously to the first image; calculating an average first ratio from a plurality of first ratios, each associated with a corresponding image of the plurality of images and comparing a corresponding first energy value of the image to a corresponding second energy value of the image, the plurality of first ratios including a corresponding first ratio for each of the first image, the previous image, and the current image; determining that the average first energy value exceeds the energy threshold and the average first ratio is approximately 1.0; and determining that the average first energy value of the first image is greater than a corresponding average first energy value of each of: a previous valid image of the plurality of valid images, the previous valid image captured immediately before the first image; and a subsequent valid image of the plurality of valid images, the subsequent valid image captured immediately after the first image.
 11. The method of claim 5, wherein applying the Laplacian pyramid comprises selecting one or more parameters of the Laplacian pyramid such that: a first Laplacian pyramid based on a first unfocused image captured when the mobile device is a first distance away from the textured surface, the first distance being less than the focal length, exhibits a corresponding first level and a corresponding second level in which a Laplacian energy of the first level is higher than and not approximately equal to a Laplacian energy of the second level; and a second Laplacian pyramid based on a second unfocused image captured when the mobile device is a second distance away from the textured surface, the second distance being greater than the focal length, exhibits a corresponding first level in which a Laplacian energy of the first level does not exceed the energy threshold.
 12. The method of claim 5, wherein performing the action comprises: determining that a blurriness of the first image exceeds a threshold blurriness; and producing, using the mobile device, a cue indicating that the plurality of images should be discarded and a plurality of new images of the textured surface must be captured.
 13. The method of claim 5, wherein the textured surface is a skin surface of a subject, the skin surface having a base appearance and one or more weak features that each are not significantly differentiated from the base appearance or from the other one or more weak features, and wherein performing the action comprises detecting the one or more weak features in the best-focused image and generating information about the one or more weak features.
 14. The method of claim 13, wherein the one or more weak features are selected from the group comprising furrows, wrinkles, and pores.
 15. A mobile device comprising: a camera set to a focal length; memory storing device logic and a Laplacian pyramid; and a processor in electronic communication with the memory and the camera, the processor executing the device logic to: receive, from the camera, a plurality of images of a region of interest; apply the Laplacian pyramid to each image to generate a corresponding Laplacian pyramid having a first level based on the image and a second level based on the first level; determine, for each image of the plurality of images, a corresponding first energy value representing Laplacian energy of the first level of the corresponding pyramid; determine, for each image of the plurality of images, a corresponding second energy value representing Laplacian energy of the second level of the corresponding pyramid; determine a best-focused image of the plurality of images based at least in part on the corresponding first energy value of each valid image of a plurality of valid images including the best-focused image, wherein each valid image of the plurality of valid images is one of the plurality of images and has a first energy value that exceeds an energy threshold and is approximately equal to a second energy value; and perform an action based on the best-focused image.
 16. The mobile device of claim 15, wherein the Laplacian pyramid comprises a high-pass filtering kernel of a predetermined size and a predetermined frequency, the predetermined size and the predetermined frequency selected such that: a first Laplacian pyramid for a first unfocused image captured when the mobile device is a first distance away from a subject, the first distance being less than the focal length, exhibits a corresponding first level and a corresponding second level in which a Laplacian energy of the first level is higher than and not approximately equal to a Laplacian energy of the second level; and a second Laplacian pyramid based on a second unfocused image captured when the mobile device is a second distance away from the subject, the second distance being greater than the focal length, exhibits a corresponding first level in which a Laplacian energy of the first level does not exceed the energy threshold.
 17. The mobile device of claim 15, further comprising an autofocus motor, wherein the focal length of the camera is adjustable by the autofocus motor from a minimum focal length to a maximum focal length, and wherein to receive the plurality of images, the processor executes the device logic to fix the focal length at the minimum focal length, and to disable the autofocus motor, during capture by the camera of the plurality of images.
 18. The mobile device of claim 15, further comprising a range sensor in electronic communication with the processor, wherein the region of interest includes a textured surface and to receive the plurality of images, the processor executes the device logic to: receive a signal from the range sensor; determine that the signal indicates that the range sensor detected that the mobile device is positioned at a first point that is a first distance away from the textured surface, the first distance being less than the focal length; and control the camera to capture the plurality of images at a capture rate as the mobile device is continuously moved from the first point to a second point that is a second distance away from the textured surface, the second distance being greater than the focal length.
 19. The mobile device of claim 18, wherein the processor receives each image of the plurality of images as the image is captured, such that the plurality of images has a first sequence and the plurality of valid images has a second sequence comprising the first sequence with each image of the plurality of images that is not one of the plurality of valid images removed; wherein each valid image of the plurality of valid images further has a corresponding average first energy value and a corresponding average second energy value that exceed the energy threshold and are approximately equal, the processor executing the device logic to: calculate the corresponding average first energy value from the corresponding first energy values of the valid image, a previous image positioned immediately before the valid image in the first sequence, and a subsequent image positioned immediately after the valid image in the first sequence; calculate the corresponding average second energy value from the corresponding second energy values of the valid image, the previous image, and the subsequent image; and determine that the corresponding average first energy value exceeds the energy threshold and is approximately equal to the corresponding average second energy value; wherein to determine the best-focused image, the processor executes the device logic to: determine that the corresponding average first energy value of a first image of the plurality of valid images is greater than the corresponding average first energy value of each of: a previous valid image of the plurality of valid images, the previous valid image positioned immediately before the first image in the second sequence; and a subsequent valid image of the plurality of valid images, the subsequent valid image positioned immediately after the first image in the second sequence; and wherein to perform the action, the processor executes the device logic to control the camera to stop capturing the plurality of images.
 20. The mobile device of claim 15, further comprising a speaker, wherein to receive the plurality of images, the processor executes the device logic to: determine one or more conditions of the mobile device; generate one or more audible cues based on the one or more conditions, the one or more audible cues aiding a user of the mobile device to position the mobile device for capturing one or more of the plurality of images; and output the one or more audible cues via the speaker.
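
As a non-limiting illustration of the energy and validity determinations recited in claim 15, the following Python sketch computes a Laplacian energy for two pyramid levels and selects the valid image with the highest first energy value. It is a simplified sketch under stated assumptions and does not represent the claimed implementation: the [1, 2, 1]/4 smoothing kernel, the use of a blurred copy in place of an upsampled coarser level, the function names, and the `energy_threshold` and `ratio_tolerance` parameters (one possible reading of "approximately equal") are all hypothetical choices made for illustration.

```python
def _smooth_rows(img):
    # Assumed kernel: [1, 2, 1] / 4 along each row, clamping at the borders.
    out = []
    for row in img:
        w = len(row)
        out.append([(row[max(x - 1, 0)] + 2 * row[x] + row[min(x + 1, w - 1)]) / 4.0
                    for x in range(w)])
    return out


def _transpose(img):
    return [list(col) for col in zip(*img)]


def blur(img):
    # Separable low-pass filter: smooth rows, then columns.
    return _transpose(_smooth_rows(_transpose(_smooth_rows(img))))


def laplacian_energies(img, levels=2):
    """Return the mean squared band-pass value for each pyramid level.

    Simplification: each band is the image minus its blurred copy, and the
    next level is the blurred copy downsampled by two.
    """
    energies = []
    current = img
    for _ in range(levels):
        low = blur(current)
        band = [[c - l for c, l in zip(crow, lrow)]
                for crow, lrow in zip(current, low)]
        count = len(band) * len(band[0])
        energies.append(sum(v * v for row in band for v in row) / count)
        current = [row[::2] for row in low[::2]]
    return energies


def is_valid(e1, e2, energy_threshold, ratio_tolerance):
    # Valid: first energy exceeds the threshold and the ratio e1 / e2 is
    # within the (hypothetical) tolerance of 1.
    if e1 <= energy_threshold or e2 <= 0:
        return False
    return abs(e1 / e2 - 1.0) <= ratio_tolerance


def best_focused_index(energy_pairs, energy_threshold, ratio_tolerance):
    """Index of the valid image with the highest first energy, or None."""
    best = None
    for i, (e1, e2) in enumerate(energy_pairs):
        if is_valid(e1, e2, energy_threshold, ratio_tolerance):
            if best is None or e1 > energy_pairs[best][0]:
                best = i
    return best
```

In use, `laplacian_energies` would be called on the luminance values of each captured image (a list of rows of numbers), and the resulting pairs passed to `best_focused_index`. The three-image averaging and local-peak test of claim 19 are not shown.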